In recent years, the difference between more urbanised and less urbanised areas has become increasingly pronounced. For example, many young and highly educated people are moving to the big cities of the Netherlands, resulting in population shrinkage and ageing in less urbanised areas (CBS, 2016). This growing contrast between urbanised and less urbanised areas could have a large effect on travel behaviour within these regions. This report investigates how the grade of urbanisation of an area influences the travel behaviour of its residents. Our research question is therefore:
How does the grade of urbanisation of an area affect the travel behaviour of its residents?
The research is conducted using a mobility dataset from CBS (Statistics Netherlands), which is explained in more detail in the next paragraph. CBS defines the grade of urbanisation of an area based on the number of surrounding addresses per square kilometre. Before analysing the dataset, we formulated the following hypotheses for our research question:
First, the dataset used is explained in more detail. Next, the process of cleaning and organising the data is shown. In the third chapter, the data are analysed to answer the research question. The last chapter presents the conclusions of our findings.
The dataset contains information on the mobility of residents of the Netherlands aged 6 or older living in private households, so residents of institutions and homes are excluded. Per person per day/year, the table gives the average number of trips, the average distance travelled and the average time travelled. These are regular trips on Dutch territory, including domestic holiday mobility. The distance travelled is based on stage information. Mobility consisting of series-of-calls trips is excluded from this dataset. The mobility behaviour is broken down by modes of travel, purposes of travel, population and region characteristics. The data are retrieved from the Dutch National Travel Survey, Onderweg in Nederland (ODiN).
According to CBS (2022), a stage is the part of a trip made with a single mode of transport.
Data available from: 2018 to 2021
The dataset we are using can be found here.
The dataset has the following columns:

- travel motives: the purpose for which people travelled
- population: for this dataset, people aged 6 or older
- travel modes: the mode of transport people chose to travel with
- region characteristics: explained below
- periods: year
- Average per person per day/Trips (number): average number of trips per person per day
- Average per person per day/Distance travelled (passenger kilometres): average distance travelled per person per day (km)
- Average per person per year/Trips (number): average number of trips per person per year
- Average per person per year/Distance travelled (passenger kilometres): average distance travelled per person per year (km)

Description of the region characteristics column:
Urbanisation is classified into five categories based on surrounding address density: extremely, strongly, moderately, hardly and not urbanised.
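To make the classification concrete, the sketch below maps a surrounding address density to one of the five grades. The thresholds (2,500, 1,500, 1,000 and 500 addresses per km², lower bound inclusive) are the CBS class boundaries as we understand them; check them against the CBS definition before relying on them. The function name `urbanisation_grade` is our own.

```python
def urbanisation_grade(address_density: float) -> str:
    """Return the CBS urbanisation grade for a surrounding address density
    (addresses per km²). Thresholds are assumed CBS class boundaries."""
    if address_density < 0:
        raise ValueError("address density cannot be negative")
    if address_density >= 2500:
        return "Extremely urbanised"
    if address_density >= 1500:
        return "Strongly urbanised"
    if address_density >= 1000:
        return "Moderately urbanised"
    if address_density >= 500:
        return "Hardly urbanised"
    return "Not urbanised"


print(urbanisation_grade(3000))
print(urbanisation_grade(750))
print(urbanisation_grade(120))
```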
The dataset also contains data on entire provinces. While the first four paragraphs of the data analysis chapter focus on the region characteristics defined above, the fifth paragraph uses the province data to compare with the previous results.
Before using the data for analysis and visualisation, we first need to clean and organise it. This is done in the code blocks below.
# Import all libraries we use
import pandas as pd
import seaborn as sns
import plotly.express as px
import numpy as np
import plotly.io as pio
import matplotlib.pyplot as plt
file_path = 'per_person__travel_modes__travel_purpose_12102022_104624.csv'
df = pd.read_csv(file_path, delimiter=';', encoding='Windows-1252')
df.head()
| | "Travel motives" | Population | Travel modes | Margins | Region characteristics | Periods | Average per person per day/Trips (number) | Average per person per day/Distance travelled (passenger kilometres ) | Average per person per year/Trips (number) | Average per person per year/Distance travelled (passenger kilometres ) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Total | Population 6 years or older | Total | Value | The Netherlands | 2018 | 2.78 | 36.16 | 1015 | 13200 |
| 1 | Total | Population 6 years or older | Total | Value | The Netherlands | 2019 | 2.71 | 36.00 | 989 | 13140 |
| 2 | Total | Population 6 years or older | Total | Value | The Netherlands | 2020 | 2.35 | 24.88 | 861 | 9105 |
| 3 | Total | Population 6 years or older | Total | Value | The Netherlands | 2021 | 2.51 | 27.24 | 915 | 9942 |
| 4 | Total | Population 6 years or older | Total | Value | Extremely urbanised | 2018 | 2.70 | 32.66 | 987 | 11922 |
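The per-year columns look like the per-day averages multiplied by the number of days in the year. The sketch below checks this for the five rows shown above (values copied from the table); the check is our own and the function name `relative_gap` is hypothetical. If it holds for the whole dataset, the per-day and per-year columns carry the same information, so comparing regions on the per-day columns alone loses nothing.

```python
import calendar

# (region, year, trips/day, trips/year, km/day, km/year), copied from df.head() above
rows = [
    ("The Netherlands", 2018, 2.78, 1015, 36.16, 13200),
    ("The Netherlands", 2019, 2.71, 989, 36.00, 13140),
    ("The Netherlands", 2020, 2.35, 861, 24.88, 9105),
    ("The Netherlands", 2021, 2.51, 915, 27.24, 9942),
    ("Extremely urbanised", 2018, 2.70, 987, 32.66, 11922),
]


def relative_gap(per_day: float, per_year: float, year: int) -> float:
    """Relative difference between a yearly figure and per_day * days in that year."""
    days = 366 if calendar.isleap(year) else 365
    return abs(per_year - per_day * days) / per_year


gaps = []
for region, year, trips_d, trips_y, km_d, km_y in rows:
    trip_gap = relative_gap(trips_d, trips_y, year)
    km_gap = relative_gap(km_d, km_y, year)
    gaps.extend([trip_gap, km_gap])
    print(f"{region} {year}: trips gap {trip_gap:.2%}, distance gap {km_gap:.2%}")
```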
Check the column names to see whether they need to be changed:
df.columns
Index(['"Travel motives"', 'Population', 'Travel modes', 'Margins',
'Region characteristics', 'Periods',
'Average per person per day/Trips (number)',
'Average per person per day/Distance travelled (passenger kilometres )',
'Average per person per year/Trips (number)',
'Average per person per year/Distance travelled (passenger kilometres )'],
dtype='object')
A common practice is to rename the columns to lower case without whitespace:
df.rename(columns={'"Travel motives"': 'travel_motive', 'Population':'population', 'Travel modes':'travel_mode', 'Margins':'margine',
'Region characteristics':'region_characteristics', 'Periods':'year',
'Average per person per day/Trips (number)':'average_trips_per_person_per_day_number',
'Average per person per day/Distance travelled (passenger kilometres )':'average_trips_per_person_per_day_distance(km)',
'Average per person per year/Trips (number)':'average_trips_per_person_per_year_number',
'Average per person per year/Distance travelled (passenger kilometres )':'average_trips_per_person_per_year_distance(km)'}, inplace=True)
df.columns
Index(['travel_motive', 'population', 'travel_mode', 'margine',
'region_characteristics', 'year',
'average_trips_per_person_per_day_number',
'average_trips_per_person_per_day_distance(km)',
'average_trips_per_person_per_year_number',
'average_trips_per_person_per_year_distance(km)'],
dtype='object')
Get an overall view of the dataset and check the data types:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1008 entries, 0 to 1007
Data columns (total 10 columns):
 #   Column                                          Non-Null Count  Dtype
---  ------                                          --------------  -----
 0   travel_motive                                   1008 non-null   object
 1   population                                      1008 non-null   object
 2   travel_mode                                     1008 non-null   object
 3   margine                                         1008 non-null   object
 4   region_characteristics                          1008 non-null   object
 5   year                                            1008 non-null   int64
 6   average_trips_per_person_per_day_number         1008 non-null   object
 7   average_trips_per_person_per_day_distance(km)   1008 non-null   object
 8   average_trips_per_person_per_year_number        1008 non-null   object
 9   average_trips_per_person_per_year_distance(km)  1008 non-null   object
dtypes: int64(1), object(9)
memory usage: 78.9+ KB
The dataset contains 1008 rows and 10 columns. At first sight it contains no null values, since the Non-Null Count of every column equals the number of rows. All columns are of type object, except for the year column, which is int64.
Therefore, the last four columns have to be converted to a numeric type. As a first step, we strip whitespace and remove the characters ',' and "'" that would prevent the conversion:
for c in df.columns[6:]:
    # Strip whitespace and remove ',' and "'" separators from the numeric columns
    df[c] = df[c].str.strip().str.replace(',', '').str.replace("'", "")
# dig deeper into the dataset
df['average_trips_per_person_per_day_number'].unique()
array(['2.78', '2.71', '2.35', '2.51', '2.70', '2.59', '2.14', '2.33',
'2.80', '2.74', '2.37', '2.52', '2.84', '2.61', '2.76', '2.44',
'2.67', '2.64', '2.41', '2.53', '0.96', '0.95', '0.81', '0.82',
'0.66', '0.63', '0.52', '0.54', '0.97', '0.80', '1.06', '0.93',
'1.14', '1.13', '0.98', '1.02', '1.18', '1.01', '1.00', '0.32',
'0.31', '0.24', '0.26', '0.23', '0.18', '0.20', '0.33', '0.25',
'0.27', '0.37', '0.28', '0.35', '0.08', '0.03', '0.13', '0.05',
'0.06', '0.09', '0.04', '0.07', '0.02', '.', '0.16', '0.01',
'0.79', '0.76', '0.64', '0.86', '0.65', '0.77', '0.68', '0.69',
'0.75', '0.71', '0.60', '0.61', '0.58', '0.53', '0.55', '0.44',
'0.43', '0.73', '0.46', '0.42', '0.62', '0.41', '0.49', '0.56',
'0.36', '0.34', '0.51', '0.50', '0.30', '0.38', '0.39', '0.19',
'0.12', '0.22', '0.21', '0.29', '0.14', '0.10', '0.15', '0.11',
'0.00', '0.59', '0.57', '0.48', '0.47', '0.17'], dtype=object)
The missing values appear to be stored as the string '.'. Below, we look into which rows and columns contain these missing values.
The travel_motive column is divided into several motives, some of which contain missing data points. As discussed beforehand with the professor, the problem of missing values is largely avoided if we use the sum of all motives (the Total category) instead of looking at specific motives. The output below shows that only three rows with Total as travel_motive still contain missing values.
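An alternative to matching the '.' string by hand is to let pandas treat it as missing. The sketch below, on a small made-up Series rather than the real data, uses `pd.to_numeric` with `errors="coerce"`; passing `na_values=["."]` to `pd.read_csv` would have a similar effect at load time.

```python
import pandas as pd

# Small made-up stand-in for one numeric column, using the same '.' marker
raw = pd.Series(["2.78", "0.96", ".", "0.03"])

# errors="coerce" turns anything that cannot be parsed as a number (here '.')
# into NaN, so missing values can be found with isna() instead of string matching
values = pd.to_numeric(raw, errors="coerce")

print(values.isna().sum())
print(values.tolist())
```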
df[df.isin(['.']).any(axis=1)]['travel_motive'].value_counts()
Professionally                              85
Services/care                               58
Shopping, groceries, funshopping.           29
Attending education/courses                 24
Travel to/from work, (non)-daily commute    23
Total                                        3
Name: travel_motive, dtype: int64
Below, we print the three rows that have missing values to get a sense of the data.
Next, we need a sensible way to replace these missing values. One option is to use the neighbouring urbanisation category: if a row with missing values belongs to the Not urbanised category, it makes sense to replace its values with those of the Hardly urbanised row from the same year and travel mode.
df[df.isin(['.']).any(axis=1) & (df['travel_motive'] == 'Total')]
| | travel_motive | population | travel_mode | margine | region_characteristics | year | average_trips_per_person_per_day_number | average_trips_per_person_per_day_distance(km) | average_trips_per_person_per_year_number | average_trips_per_person_per_year_distance(km) |
|---|---|---|---|---|---|---|---|---|---|---|
| 94 | Total | Population 6 years or older | Train | Value | Not urbanised | 2020 | . | . | . | . |
| 95 | Total | Population 6 years or older | Train | Value | Not urbanised | 2021 | . | . | . | . |
| 118 | Total | Population 6 years or older | Bus/metro | Value | Not urbanised | 2020 | . | . | . | . |
We replace the missing values of row 94 (Not urbanised, 2020, Train) with those of row 90, which has the same year and travel mode but the Hardly urbanised category. Following the same reasoning, we replace the other two rows as well.
df.iloc[94,6:10] = df.iloc[90,6:10]
df.iloc[95,6:10] = df.iloc[91,6:10]
df.iloc[118,6:10] = df.iloc[114,6:10]
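The hard-coded positions 90, 91 and 114 only work as long as the CSV keeps exactly this row order. A more robust alternative is to look up the donor row by year and travel mode. The sketch below is our own, not part of the original analysis: it assumes the numeric columns have already been converted to floats with '.' turned into NaN, and it uses hypothetical short column names and made-up numbers.

```python
import pandas as pd

# Hypothetical short names for the numeric columns
NUMERIC = ["trips_per_day", "km_per_day"]


def fill_from_neighbour(frame: pd.DataFrame, missing_grade: str,
                        donor_grade: str) -> pd.DataFrame:
    """Fill NaNs in `missing_grade` rows with the values of the `donor_grade`
    row that has the same travel_mode and year."""
    out = frame.copy()
    mask = (out["region_characteristics"] == missing_grade) & out[NUMERIC].isna().any(axis=1)
    if not mask.any():
        return out
    # Donor values indexed by (travel_mode, year) for lookup
    donor = (out[out["region_characteristics"] == donor_grade]
             .set_index(["travel_mode", "year"])[NUMERIC])
    keys = pd.MultiIndex.from_frame(out.loc[mask, ["travel_mode", "year"]])
    filler = donor.reindex(keys)
    filler.index = out.index[mask]
    # Only NaN cells are filled; existing values stay untouched
    out.loc[mask, NUMERIC] = out.loc[mask, NUMERIC].fillna(filler)
    return out


demo = pd.DataFrame({
    "travel_mode": ["Train", "Train", "Train", "Train"],
    "region_characteristics": ["Hardly urbanised", "Hardly urbanised",
                               "Not urbanised", "Not urbanised"],
    "year": [2020, 2021, 2020, 2021],
    "trips_per_day": [0.05, 0.06, None, None],
    "km_per_day": [2.0, 2.5, None, None],
})  # made-up numbers, for illustration only

filled = fill_from_neighbour(demo, "Not urbanised", "Hardly urbanised")
print(filled)
```

On the real data, `NUMERIC` would list the four `average_...` columns.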
Now we proceed to filter the travel_motive column. We only keep the rows that have Total as travel_motive.
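df = df[df['travel_motive']=='Total'].copy()
# Convert the four numeric columns; assigning whole columns also updates their dtype
num_cols = df.columns[6:10]
df[num_cols] = df[num_cols].astype(float)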
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 168 entries, 0 to 167
Data columns (total 10 columns):
 #   Column                                          Non-Null Count  Dtype
---  ------                                          --------------  -----
 0   travel_motive                                   168 non-null    object
 1   population                                      168 non-null    object
 2   travel_mode                                     168 non-null    object
 3   margine                                         168 non-null    object
 4   region_characteristics                          168 non-null    object
 5   year                                            168 non-null    int64
 6   average_trips_per_person_per_day_number         168 non-null    float64
 7   average_trips_per_person_per_day_distance(km)   168 non-null    float64
 8   average_trips_per_person_per_year_number        168 non-null    float64
 9   average_trips_per_person_per_year_distance(km)  168 non-null    float64
dtypes: float64(4), int64(1), object(5)
memory usage: 14.4+ KB
Below, we drop the columns population and margine, since each contains only one predetermined value, as explained under the Data Used heading.
print(df['population'].unique())
print(df['margine'].unique())
['Population 6 years or older'] ['Value']
df.drop(['population', 'margine'], axis=1, inplace=True)
df.columns
Index(['travel_motive', 'travel_mode', 'region_characteristics', 'year',
'average_trips_per_person_per_day_number',
'average_trips_per_person_per_day_distance(km)',
'average_trips_per_person_per_year_number',
'average_trips_per_person_per_year_distance(km)'],
dtype='object')
The dataframe is now ready for analysis. This chapter focuses on analysing the data and answering the research question.
The first paragraph analyses the relation between urbanisation grade and general travel behaviour in 2018. The second paragraph examines the relation between urbanisation grade and the use of different modes of transport, also in 2018. The third paragraph investigates how general travel behaviour developed over time for the different urbanisation grades. The fourth paragraph summarises the previous paragraphs in a complete visualisation that answers the research question.
The conclusions of paragraph four concern urbanisation grades that are defined per square kilometre. The fifth paragraph examines to what extent these findings also hold for the general travel behaviour of entire provinces.
The following bar chart shows the average number of trips made per person per day in the different urbanisation grades. For a clear overview, we only look at the average number of trips, without distinguishing between travel modes. We expect the number of trips to be higher in more urbanised areas, because activities are closer to people's homes, so the resistance to making a trip is smaller.
#Filtering the data to only look at data from 2018
df_2018 = df.query('year==2018')
df_total = df_2018[df_2018['travel_mode'] == 'Total']
#Creating bar chart
fig = px.bar(df_total, x='region_characteristics', y='average_trips_per_person_per_day_number',
title="Amount of trips made per person per day in different grades of urbanizations",
text='average_trips_per_person_per_day_number')
#Zoom in to improve readability
fig.update_layout(yaxis_range=[2,3])
#Update titles x- and y-axis
fig.update_layout(xaxis_title="Urbanisation grade",
yaxis_title="Average amount of trips per person per day")
#Show final barchart
fig.show()
The graph partly meets our expectations. The number of trips made in not urbanised areas is lower than in all other urbanisation grades. However, the chart also shows that the number of trips made in strongly, moderately and hardly urbanised areas is higher than in extremely urbanised areas, which does not match the expectation above.
The following bar chart shows the average distance travelled per person per day in the different urbanisation grades. Again, we only look at the average distance without distinguishing between travel modes, because that gives a clearer overview. We expect the average distance travelled to be lower in more urbanised areas, because activities are closer to people's homes, so people have to travel a shorter distance to reach them.
#Creating bar chart
fig = px.bar(df_total, x='region_characteristics', y='average_trips_per_person_per_day_distance(km)',
title="Average distance traveled per person per day in different grades of urbanizations",
text='average_trips_per_person_per_day_distance(km)')
#Zoom in to improve readability
fig.update_layout(yaxis_range=[30,40])
#Update titles x- and y-axis
fig.update_layout(xaxis_title="Urbanisation grade", yaxis_title="Average distance traveled per person per day(km)")
#Show final barchart
fig.show()
The graph partly meets our expectations. The bar chart shows that the average distance travelled increases as the urbanisation grade decreases. However, the distance travelled in not urbanised areas is lower than in hardly urbanised areas, which does not match our expectations.
In this research, the relationship between urbanisation grade and the use of different modes of travel is also analysed. To do this, two sets of pie charts have been created.
The first set of pie charts highlights the relationship between urbanisation grade and average distance traveled per person per day for different modes of transport.
Before analysing the data, we expected some significant differences in transport use between urbanisation grades. The following list gives an overview of these expectations:
textding = 'average_trips_per_person_per_day_distance(km)'
#Import
import plotly.graph_objects as go
from plotly.subplots import make_subplots
#Exclude data on all the modes of transport combined
df2 = df[df['travel_mode']!="Total"]
#Only use data from 2018
df2018 = df2[df2['year']==2018]
#Prepare dataframes for each level of urbanization
dfNu = df2018[df2018["region_characteristics"]=="Not urbanised"]
dfHu = df2018[df2018["region_characteristics"]=="Hardly urbanised"]
dfMu = df2018[df2018["region_characteristics"]=="Moderately urbanised"]
dfSu = df2018[df2018["region_characteristics"]=="Strongly urbanised"]
dfEu = df2018[df2018["region_characteristics"]=="Extremely urbanised"]
#Set labels
labels = dfNu["travel_mode"]
#Create subplot frame
fig = make_subplots(rows=2, cols=3, specs=[[{'type':'domain'}, {'type':'domain'}, {'type':'domain'}], [{'type':'domain'}, {'type':'domain'}, {'type':'domain'}]],
subplot_titles=["Not urbanised", "Hardly urbanised", "Moderately urbanised", "Strongly urbanised", "Extremely urbanised"])
#Add each subplot
fig.add_trace(go.Pie(labels=labels, values=dfNu[textding]),
1, 1)
fig.add_trace(go.Pie(labels=labels, values=dfHu[textding]),
1, 2)
fig.add_trace(go.Pie(labels=labels, values=dfMu[textding]),
1, 3)
fig.add_trace(go.Pie(labels=labels, values=dfSu[textding]),
2, 1)
fig.add_trace(go.Pie(labels=labels, values=dfEu[textding]),
2, 2)
fig.update_layout(title_text="Average distance traveled per person per day for different modes of transport")
#Create a hole (donut chart) and show label + value as hover data
fig.update_traces(hole=.4, hoverinfo="label+value")
#Show total subplot frame
fig.show()
The results of the first set of pie charts are shown directly above. The size of each coloured segment indicates how much the particular mode of transport contributes to the total average distance travelled per person per day. Hovering the mouse over a segment shows the average distance per person per day (in km) for that mode of transport.
Most of the hypotheses were correct, except the prediction about bicycle usage. The results show that the more urbanised an area is, the more distance is travelled by bicycle. Although the average distance per bicycle trip is probably higher in not urbanised areas, the larger number of bicycle trips in urbanised areas apparently makes up for it. This can be verified in the next set of pie charts, which show the number of trips for the different modes of transport.
Another interesting result is the significant increase in the distance travelled by train in more urbanised regions. While the train makes up only 5.47% of the travel distance in not urbanised regions, it makes up 20.1% in extremely urbanised regions.
The second set highlights the relationship between urbanisation grade and the average number of trips per person per day for different modes of transport.
Before analysing the data, we expected the following differences in transport use between urbanisation grades:
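Percentages such as 5.47% and 20.1% are each mode's share of the regional total, which is what the pie charts compute from the plotted values. The sketch below reproduces that calculation with a pandas groupby; the function name `mode_shares` is ours and the demo uses made-up numbers rather than the CBS data.

```python
import pandas as pd


def mode_shares(frame: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Percentage that each travel mode contributes to its region's total."""
    totals = frame.groupby("region_characteristics")[value_col].transform("sum")
    out = frame[["region_characteristics", "travel_mode"]].copy()
    out["share_pct"] = 100 * frame[value_col] / totals
    return out


demo = pd.DataFrame({
    "region_characteristics": ["A", "A", "B", "B"],
    "travel_mode": ["Train", "Car", "Train", "Car"],
    "km": [1.0, 3.0, 2.0, 2.0],
})  # made-up numbers, for illustration only

shares = mode_shares(demo, "km")
print(shares)
```

Called as `mode_shares(df2018, 'average_trips_per_person_per_day_distance(km)')` on the 2018 data without the Total mode, it should reproduce the pie-chart percentages, since both divide each mode by the sum over the modes shown.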
textding = 'average_trips_per_person_per_day_number'
#Exclude data on all the modes of transport combined
df2 = df[df['travel_mode']!="Total"]
#Only use data from 2018
df2018 = df2[df2['year']==2018]
#Prepare dataframes for each level of urbanization
dfNu = df2018[df2018["region_characteristics"]=="Not urbanised"]
dfHu = df2018[df2018["region_characteristics"]=="Hardly urbanised"]
dfMu = df2018[df2018["region_characteristics"]=="Moderately urbanised"]
dfSu = df2018[df2018["region_characteristics"]=="Strongly urbanised"]
dfEu = df2018[df2018["region_characteristics"]=="Extremely urbanised"]
#Set labels
labels = dfNu["travel_mode"]
#Create subplot frame
fig = make_subplots(rows=2, cols=3, specs=[[{'type':'domain'}, {'type':'domain'}, {'type':'domain'}], [{'type':'domain'}, {'type':'domain'}, {'type':'domain'}]],
subplot_titles=["Not urbanised", "Hardly urbanised", "Moderately urbanised", "Strongly urbanised", "Extremely urbanised"])
#Add each subplot
fig.add_trace(go.Pie(labels=labels, values=dfNu[textding]),
1, 1)
fig.add_trace(go.Pie(labels=labels, values=dfHu[textding]),
1, 2)
fig.add_trace(go.Pie(labels=labels, values=dfMu[textding]),
1, 3)
fig.add_trace(go.Pie(labels=labels, values=dfSu[textding]),
2, 1)
fig.add_trace(go.Pie(labels=labels, values=dfEu[textding]),
2, 2)
fig.update_layout(title_text="Average amount of trips per person per day for different modes of transport")
#Create a hole (donut chart) and show label + value as hover data
fig.update_traces(hole=.4, hoverinfo="label+value")
#Show total subplot frame
fig.show()
The results of the second set of pie charts are shown directly above. The size of each coloured segment indicates how much the particular mode of transport contributes to the total number of trips per person per day. Hovering the mouse over a segment shows how many trips the average person makes per day using that mode of transport.
For this subquestion, all our hypotheses were correct. As the urbanisation grade increases, the share of trips per day increases for every mode of transport other than the car, at the expense of travelling by car. It is interesting to note that the share of trips made as a car driver decreases more rapidly than the share made as a car passenger. This indicates that people in more urbanised regions are more likely to share a car than people in less urbanised regions.
In the previous visualisations we explored the relation between urbanisation grade and travel behaviour in 2018. To gain better insight into the general relation, it is also interesting to examine how these relations developed over the years. The dataset contains data from 2018 to 2021, of which the last two years were affected by the coronavirus. The next visualisations explore the relation between urbanisation and travel behaviour from 2018 to 2021, thereby also providing insight into the effect of the coronavirus.
The line graph below shows the average trips per person per day over time for different urbanisation grades.
We expect a decrease in the average number of trips per person per day in 2020 and 2021 due to the coronavirus, and we expect all urbanisation grades to follow the same pattern.
df2 = df[df["travel_mode"]=="Total"]
fig = px.line(df2, x="year", y="average_trips_per_person_per_day_number", color="region_characteristics",
title='Average trips per person per day over time for different urbanization grades', markers=True)
dfTot = df2[df2["region_characteristics"]=="The Netherlands"]
fig.update_xaxes(nticks = len(dfTot["year"]))
fig.update_layout(xaxis_title="Year", yaxis_title="Average trips per person per day", legend_title="Urbanisation grade")
fig.show()
The results of the visualisation can be seen directly above. Hovering over the data points shows the average trips per person per day at that point in the graph.
The first part of our hypothesis was correct: the number of trips decreased in 2020 and 2021. The second part, however, is not entirely correct: the extremely urbanised areas seem to be the most affected by the coronavirus, while the not urbanised areas seem to be the least affected.
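Whether a grade is "most affected" can be made precise as the relative change between 2019 and 2020. The sketch below computes that per urbanisation grade with `pivot`; the function name `covid_drop` is ours and the demo uses made-up numbers. On the real data it could be applied to `df2` (the `travel_mode == "Total"` rows) with `value_col="average_trips_per_person_per_day_number"`; `pivot` requires exactly one row per grade and year, which should hold for that subset.

```python
import pandas as pd


def covid_drop(frame: pd.DataFrame, value_col: str,
               before: int = 2019, after: int = 2020) -> pd.Series:
    """Relative change in `value_col` (in %) from `before` to `after`,
    per urbanisation grade, sorted from the largest drop to the smallest."""
    wide = frame.pivot(index="region_characteristics", columns="year",
                       values=value_col)
    change = 100 * (wide[after] - wide[before]) / wide[before]
    return change.sort_values()


demo = pd.DataFrame({
    "region_characteristics": ["X", "X", "Y", "Y"],
    "year": [2019, 2020, 2019, 2020],
    "trips": [2.0, 1.5, 4.0, 3.5],
})  # made-up numbers, for illustration only

result = covid_drop(demo, "trips")
print(result)
```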
The line graph below shows the average distance traveled per person per day over time for different urbanisation grades.
We expect a decrease in the average travel distance per person per day in 2020 and 2021 due to the coronavirus, and we expect all urbanisation grades to follow the same pattern.
df2 = df[df["travel_mode"]=="Total"]
fig = px.line(df2, x="year", y="average_trips_per_person_per_day_distance(km)", color="region_characteristics",
title='Average distance traveled per person per day over time for different urbanization grades', markers=True)
dfTot = df2[df2["region_characteristics"]=="The Netherlands"]
fig.update_xaxes(nticks = len(dfTot["year"]))
fig.update_layout(xaxis_title="Year", yaxis_title="Average distance traveled per person per day(km)", legend_title="Urbanisation grade")
fig.show()
The results of the visualisation can be seen directly above. Hovering over the data points shows the average distance traveled per person per day at that point in the graph.
For this visualisation, our hypothesis was correct: the average travel distance decreased during the corona years, and all urbanisation grades seem to follow the same pattern. It is interesting to note that the average travel distance in not urbanised areas was rising rapidly before the coronavirus outbreak, while the previous graph shows that the average number of trips in not urbanised areas was decreasing over the same period.
The following bar charts summarise the previous analyses. In both figures, the slider can be used to compare data from different years in the period 2018-2021 manually, while the play button shows the data of each year automatically. It is especially interesting to compare 2019 with 2020, because these years give good insight into the effects of COVID-19.
In both figures, the y-axis represents the different grades of urbanization. In the first figure, the x-axis represents the average amount of trips (per person per day), while the x-axis of the second figure represents the average distance travelled (per person per day). Modes of travel can easily be selected and deselected by a single click on a travel mode from the legend of the corresponding figure. Moreover, in every single bar of both figures the ratio between the different modes of travel can be obtained.
#Exclude average data for region characteristics and modes of travel
df_cleaned = df[(df['region_characteristics']!='The Netherlands') & (df['travel_mode']!='Total')]
#Create bar chart
fig = px.bar(df_cleaned, x='average_trips_per_person_per_day_number',
y='region_characteristics', color='travel_mode',
animation_frame='year',
title="Average number of trips per urbanization grade over the period 2018-2021",
orientation = 'h')
#Set correct range to improve readability
fig.update_layout(xaxis_range=[0,3])
#Update titles
fig.update_layout(xaxis_title="Average number of trips per person per day",
yaxis_title="Urbanisation grade", legend_title="Travel Modes")
#Show final barchart
fig.show()
#Create bar chart
fig = px.bar(df_cleaned, x='average_trips_per_person_per_day_distance(km)',
y='region_characteristics', color='travel_mode',
animation_frame='year',
title="Distance travelled per urbanization grade over the period 2018-2021",
orientation = 'h')
#Set correct range to improve readability
fig.update_layout(xaxis_range=[0,40])
#Update titles
fig.update_layout(xaxis_title="Distance travelled per person per day (km)",
yaxis_title="Urbanisation grade", legend_title="Travel Modes")
#Show final barchart
fig.show()
The conclusions in the previous paragraphs covered the relationship between travel behaviour and urbanisation grade, which is defined per square kilometre. To find out whether these conclusions still hold for entire provinces, we compare them with the province data from the same dataset. Before starting the comparison, we import a new dataset with the number of addresses per province (see the code below).
file_path = 'nederlandinwoners2.csv'
dfnl = pd.read_csv(file_path, delimiter=';', encoding='Windows-1252')
dfnl.head()
| | province | year | inhabitants_per_km² | adresses | province_area | address_per_km2 |
|---|---|---|---|---|---|---|
| 0 | Groningen | 2018 | 251 | 335605,00 | 2323,94 | 144,412076 |
| 1 | Fryslan | 2018 | 194 | 372384,00 | 3335,62 | 111,6386159 |
| 2 | Drenthe | 2018 | 187 | 271670,00 | 2632,65 | 103,1926006 |
| 3 | Overijssel | 2018 | 347 | 611715,00 | 3319,00 | 184,3070202 |
| 4 | Flevoland | 2018 | 292 | 210795,00 | 1411,63 | 149,3273733 |
The columns are renamed to clearer names, and the numeric columns, which use a comma as decimal separator, are converted to float.
dfnl.rename(columns={'province':'provinces', 'inhabitants_per_km²':'inhabitants_per_km2'}, inplace = True)
for column in dfnl.columns[3:]:
    # Replace the comma decimal separator by a dot before converting to float
    dfnl[column] = dfnl[column].str.replace(',', '.').astype(float)
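Instead of replacing commas after loading, pandas can parse a comma decimal separator directly through the `decimal` parameter of `pd.read_csv`. The sketch below demonstrates this on two rows copied from `dfnl.head()` above (the inhabitants column is left out); whether it works on the full file depends on no column using a comma for anything else, which we have not checked.

```python
import io

import pandas as pd

# Two rows copied from dfnl.head() above, in the same ';'-separated layout
csv_text = """province;year;adresses;province_area;address_per_km2
Groningen;2018;335605,00;2323,94;144,412076
Drenthe;2018;271670,00;2632,65;103,1926006
"""

# decimal="," tells the parser that ',' is the decimal separator
demo = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")
print(demo.dtypes)
```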
The following diagram shows how urbanised each province is. Zuid-Holland, Noord-Holland and Utrecht are the most urbanised provinces, whereas Zeeland, Fryslan and Drenthe are the least urbanised. Based on this diagram, the provinces can be divided into three subcategories:
These colour codes are also used in the following graphs.
fig = px.bar(dfnl, x='provinces', y='address_per_km2',
title='Urbanisation of provinces based on number of addresses per km2', color='provinces',
color_discrete_map={'Groningen':'seaGreen', 'Fryslan':'seaGreen', 'Drenthe':'seaGreen',
'Overijssel':'seaGreen', 'Flevoland':'seaGreen', 'Gelderland':'royalBlue',
'Utrecht':'indianRed', 'Noord-Holland':'indianRed', 'Zuid-Holland':'indianRed',
'Zeeland':'seaGreen', 'Noord-Brabant':'royalBlue', 'Limburg':'royalBlue'})
fig.update_xaxes(title='Provinces', tickangle=-45)
fig.update_yaxes(title='Number of addresses per km2')
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.show()
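The same colour map is written out three times in this chapter. The sketch below shows how it could be defined once: the provinces are grouped exactly as in the colour codes above, but the group names ("most urbanised", "intermediate", "least urbanised") are our own labels, since the code does not name the three subcategories.

```python
# Grouping taken from the colour codes above; group names are our own labels
PROVINCE_GROUPS = {
    "most urbanised": ["Utrecht", "Noord-Holland", "Zuid-Holland"],
    "intermediate": ["Gelderland", "Noord-Brabant", "Limburg"],
    "least urbanised": ["Groningen", "Fryslan", "Drenthe", "Overijssel",
                        "Flevoland", "Zeeland"],
}
GROUP_COLOURS = {
    "most urbanised": "indianRed",
    "intermediate": "royalBlue",
    "least urbanised": "seaGreen",
}

# Build the per-province colour map once, so every chart can reuse it
PROVINCE_COLOURS = {
    province: GROUP_COLOURS[group]
    for group, provinces in PROVINCE_GROUPS.items()
    for province in provinces
}
print(PROVINCE_COLOURS)
```

Each chart could then pass `color_discrete_map=PROVINCE_COLOURS` to `px.bar`.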
file_path = 'nederlandprovincies4.csv'
df2 = pd.read_csv(file_path, delimiter=';', encoding='Windows-1252')
df2.head()
| | ID | TravelMotives | Population | TravelModes | Margins | province | Periods | Trips_1 | DistanceTravelled_2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 | T001080 | A048710 | T001093 | MW00000 | Groningen | 2018 | 2.81 | 37.46 |
| 1 | 21 | T001080 | A048710 | T001093 | MW00000 | Groningen | 2019 | 2.63 | 39.76 |
| 2 | 22 | T001080 | A048710 | T001093 | MW00000 | Groningen | 2020 | 2.27 | 26.74 |
| 3 | 23 | T001080 | A048710 | T001093 | MW00000 | Groningen | 2021 | 2.41 | 30.47 |
| 4 | 24 | T001080 | A048710 | T001093 | MW00000 | Fryslan | 2018 | 2.70 | 42.60 |
# renaming the columns for more clarity
df2.rename(columns={'Periods':'year', 'Trips_1':'trips', 'DistanceTravelled_2':'distance_travelled'}, inplace=True)
#showing the distance travelled
df2018 = df2[df2['year']==2018]
fig = px.bar(df2018, x='province', y='distance_travelled',
title="Average distance traveled per person per day in different provinces", text = 'distance_travelled', color='province',
color_discrete_map={'Groningen':'seaGreen', 'Fryslan':'seaGreen', 'Drenthe':'seaGreen',
'Overijssel':'seaGreen', 'Flevoland':'seaGreen', 'Gelderland':'royalBlue',
'Utrecht':'indianRed', 'Noord-Holland':'indianRed', 'Zuid-Holland':'indianRed',
'Zeeland':'seaGreen', 'Noord-Brabant':'royalBlue', 'Limburg':'royalBlue'})
fig.update_layout(xaxis={'categoryorder':'total descending'}, yaxis_range=[30,45])
fig.update_xaxes(title='Provinces')
fig.update_yaxes(title='Average distance travelled per person per day (km)')
fig.show()
This plot shows that Drenthe, Flevoland and Fryslan have the highest average distance travelled per person per day, while Zuid-Holland, Limburg and Zeeland have the lowest. As expected, the average distance travelled tends to be higher in less urbanised provinces. However, Zeeland, one of the least urbanised provinces, and Limburg travel a shorter distance than most other provinces, just as the not urbanised areas travel a shorter distance than the hardly urbanised areas. These results are therefore similar to those found in paragraph 1.
df2018 = df2[df2['year']==2018]
fig = px.bar(df2018, x='province', y='trips',
title="Average number of trips per person per day in different provinces", text = 'trips', color='province',
color_discrete_map={'Groningen':'seaGreen', 'Fryslan':'seaGreen', 'Drenthe':'seaGreen',
'Overijssel':'seaGreen', 'Flevoland':'seaGreen', 'Gelderland':'royalBlue',
'Utrecht':'indianRed', 'Noord-Holland':'indianRed', 'Zuid-Holland':'indianRed',
'Zeeland':'seaGreen', 'Noord-Brabant':'royalBlue', 'Limburg':'royalBlue'})
fig.update_layout(xaxis={'categoryorder':'total descending'}, yaxis_range=[2.5,3])
fig.update_xaxes(title='Provinces')
fig.update_yaxes(title='Amount of trips per person per day')
fig.show()
This diagram presents the average number of trips per person per day. The most urbanised provinces, Zuid-Holland and Noord-Holland (but not Utrecht), have a low number of trips per day. Furthermore, the moderately urbanised provinces do not have the highest number of trips. In conclusion, the average number of trips per province does not show the same relation with urbanisation as the urbanisation categories in paragraph 1.
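To put a number on whether a province-level measure follows urbanisation, one option is a rank correlation between address density and the travel measure. The sketch below computes Spearman's rank correlation as the Pearson correlation of pandas ranks, which avoids a SciPy dependency; the function name is ours and the demo values are made up. With only twelve provinces, such a coefficient is at best a rough indication.

```python
import pandas as pd


def rank_correlation(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks
    (tied values get their average rank)."""
    return x.rank().corr(y.rank())


# Made-up values for four hypothetical provinces, for illustration only
density = pd.Series([100.0, 150.0, 400.0, 600.0])
distance = pd.Series([42.0, 40.0, 35.0, 33.0])
rho = rank_correlation(density, distance)
print(rho)

# Usage sketch on the real data (assuming one row per province in df2018):
# merged = df2018.merge(dfnl, left_on=["province", "year"],
#                       right_on=["provinces", "year"])
# rank_correlation(merged["address_per_km2"], merged["trips"])
```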
The research results in the following conclusions.
The less urbanised an area is, the more trips its residents make and the longer the distance they travel on average. However, areas that are not urbanised at all seem to be an exception: their residents make fewer trips and travel a shorter distance than residents of areas with a higher urbanisation grade.
A higher urbanisation grade seems to have the following effects on the use of different transport modes for residents of the area:
The frequency and distance of travel decreased in 2020 and 2021 due to the coronavirus. This decrease was most pronounced in the more urbanised areas, while the less urbanised areas were less heavily affected.
The relation between average distance travelled and urbanisation grade seems to follow the same pattern per province as per square kilometre. The same is not true for the number of trips: its relation to urbanisation grade differs at province level.